Skip to content

Ge/19 #20

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

Ge/19 #20

wants to merge 5 commits into from

Conversation

gegnew
Copy link
Contributor

@gegnew gegnew commented Nov 1, 2019

Also lots of formatting.

@gegnew
Copy link
Contributor Author

gegnew commented Nov 2, 2019

I need to resolve the namespace of the by_name function, for two reasons:

  1. Need to have an accessible cache_clear method
  2. Need to make sure that duplicate names representing different objects do not conflict (i.e. the example in the R toolkit)

@gegnew gegnew marked this pull request as ready for review November 4, 2019 12:20
@gegnew
Copy link
Contributor Author

gegnew commented Nov 4, 2019

The LRU cache (which currently has a 2048 item limit, which can be made unlimited) works for different objects (like files or gates) inside different experiments with the same name. There are accessors to the cache_clear and cache_info methods in the base module.

@gegnew gegnew force-pushed the ge/19 branch 2 times, most recently from d5fcb84 to 58d32e7 Compare November 4, 2019 15:37
Copy link
Contributor

@JobLeonard JobLeonard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So aside from a few PEP8 related nits, I'm wondering if it wouldn make the code easier to use if we take that massive list of optional parameter options for gate creation:

    label=[], gid=None, locked=False,
    parent_population_id=None, parent_population=None,
    tailored_per_file=False, fcs_file_id=None,
    fcs_file=None, create_population=True

... and make a little class for that (using __slots__ of course, for better performance)

class GateOptions:
    __slots__ ="label", "gid", "locked", "parent_population_id",
            "parent_population", "tailored_per_file", "fcs_file_id",
            "fcs_file", "create_population"

    def __init__(self, label=None, gid=None, locked=False,
                parent_population_id=None, parent_population=None,
                tailored_per_file=False, fcs_file_id=None,
                fcs_file=None, create_population=True):
        if label is None:
            label = []
        self.label = label
        self.gid = gid
        self.locked = locked
        self.parent_population_id = parent_population_id
        self.parent_population = parent_population
        self.tailored_per_file = tailored_per_file
        self.fcs_file_id = fcs_file_id
        self.fcs_file = fcs_file
        self.create_population = create_population

.. then you can replace those parameters with something like gateOpts=None in the parameters, and:

if gateOpts is None:
    gateOpts = GateOptions()

... to initialize the default values in the function (which you can then pass on to subsequent calls as well).

The downside would be that when the user would want to pass these options, they would have to write:

create_range_gate(experiment_id, x_channel, name, x1, x2, y, GateOptions(...))

instead of:

create_range_gate(experiment_id, x_channel, name, x1, x2, y, ...)

OTOH, maybe they'll just end up creating a few default options objects and re-use them.

README.md Outdated
Comment on lines 20 to 27
##Developer Notes
- `id` is a python builtin, which causes some confusion. We use `_id` to indicate
the ID of an API object, but the `attrs` package does not accept leading
underscores (i.e. an `_id = attr.ib()` in a class is treated by `attrs` as the
string "id"). Practically, this means:
- pass `_id` to functions that take an ID as an argument.
- pass `id` to `attrs` classes when instantiating them.
- pass `properties` to `attrs` classes when instantiating them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python, leading underscores are used to signal private properties to programmers. The Naming Styles section of PEP 8 says this:

_single_leading_underscore: weak "internal use" indicator. E.g. from M import * does not import objects whose names start with an underscore.

single_trailing_underscore_: used by convention to avoid conflicts with Python keyword, e.g.

    Tkinter.Toplevel(master, class_='ClassName')

Let's replace _id with id_ and see if it also fixes the attrs interop?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's gonna be a messy hassle to convert back and forth since our API uses _id... If the only known bug/limitation from using _xxx is due to attrs, I'd consider dropping that dependency first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, keep in mind that these kind of interop issues are likely happen with other Python libraries as well, because the entire Python ecosystem expects this convention.

So from that angle the argument becomes one of weighing how important it is to be consistent between Python/R/JavaScript, and within each language.

Are Python users likely to deal with the other APIs, or more likely to stick to the Python environment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish MongoDB didn't use _id...

It sounds like a single leading underscore is only a convention and hint. I'm curious why attrs breaks. I skimmed the source and searched the docs and only find this bit (ctrl-f "leading underscores") http://www.attrs.org/en/stable/examples.html or (ditto) http://www.attrs.org/en/stable/api.html. Also python-attrs/attrs#391

Is there at least some way in Python to automatically do the conversion so we never have code like req_body = {_id: self.id_}? e.g. an _id getter?

All of our API documentation uses _id (which is factual). To the extent possible, I'd like to keep consistency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we already have marshalling for our API's camelCase to Python's snake_case, so maybe I shouldn't worry.

Copy link
Contributor

@JobLeonard JobLeonard Nov 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like a single leading underscore is only a convention and hint.

Well, to quote PEP8 again: "E.g. from M import * does not import objects whose names start with an underscore", and import is a language-level keyword so it's more than a convention I'm afraid. It's more like a language feature that is not enforced by the language, because dynamic typing and monkey patching and whatnot (no, I'm not a fan either)

I'm curious why attrs breaks. I skimmed the source and searched the docs and only find this bit (ctrl-f "leading underscores") http://www.attrs.org/en/stable/examples.html or (ditto) http://www.attrs.org/en/stable/api.html. Also python-attrs/attrs#391

If I read those docs correctly, the only real problem here is that keyword arguments don't accept leading underscores. So calling SomeGatingClass(_id=1234) would fail, but SomeGatingClass(id=1234) would work fine.

I suppose attrs does this because the very idea of private properties means that they Should Not Be Parameter Names (that is, I'm passing some property x that is stored privately as _x).

Is there at least some way in Python to automatically do the conversion so we never have code like req_body = {_id: self.id_}? e.g. an _id getter?

I think @property decorator is your friend, although I am not entirely sure if it covers your use-case:

@property
_id(self):
    return self.id_

@_id.setter
def _id(self, value):
    self.id_ = value

@_id.deleter
def _id(self):
    del self.id_

Copy link
Contributor Author

@gegnew gegnew Nov 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the README to be a little more clear: basically, a user should always pass _id and never worry about this naming convention otherwise. Internally, everything should be named _id, but it is possible that you might have to pass id as an init arg somewhere. The only case I can find of this right now is actally the _properties arg in loader.Loader().make_class. There, properties is passed to a given <classname>, which contains a _properties attribute.

Otherwise, regarding this branch, I understand from this conversation that we should:

  1. keep _id as it is
  2. not use a GateOptions class for gate options

In addition, I have made all the main data classes slots classes in the next branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me but it's Zach's final call.

Nice @ slots, if I understand the use-case for this lib correctly I doubt we'll be crunching through hundreds of thousands of instances, but keeping things lean still feels good :)

@zbjornson
Copy link
Member

The downside would be that when the user would want to pass these options, they would have to write:

create_range_gate(experiment_id, x_channel, name, x1, x2, y, GateOptions(...))

I think I'd avoid GateOptions for that reason. Could you have a function like initializeGateDefaults(self) though, as far as deduping default init?

@JobLeonard
Copy link
Contributor

I think I'd avoid GateOptions for that reason. Could you have a function like initializeGateDefaults(self) though, as far as deduping default init?

These are arguments in top-level functions like create_polygon_gate, create_range_gate and so on, not classes, so that won't work. And I think Gerrit already refactored out the common code into common_gate_create (tangent: I just noticed that all of these functions follow the naming convention create_xxx_gate, except common_gate_create. Maybe make that create_common_gate for the sake of consistency? Or are those names also modelled after the R toolkit?)

If I understand the code correctly there's one generic Gate class, plus a bunch of functions that help with initializing a Gate object with the specific settings for each gate type.

These create_xxx_gate functions all have default values though, so they don't require users to pass all of them, and Python lets you use keyword arguments to skip the parameters you don't care about IIRC - so you could do create_quadrant_gate(experiment_id, x_channel, y_channel, name, x, y, fcs_file_id=<somefileid>) and skip all the parameters before fcs_file_id. So might actually be ok to use in practice I guess - but 16 parameter functions just make me do a double take.

@gegnew gegnew force-pushed the ge/19 branch 2 times, most recently from 7209121 to 9eb8c57 Compare November 6, 2019 15:43
… methods

Population resources complete, child_gate model func written

partial method for creating complex gates

added complex population creator; needs tests

create_complex_population tested

split data objects from their methods

methods + tests creating child gates

added update and delete methods
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants